Abstract. Differential privacy is a notion that has emerged in the
community of statistical databases, as a response to the problem of
protecting the privacy of the database's participants when performing
statistical queries. The idea is that a randomized query satisfies
differential privacy if the likelihood of obtaining a certain answer for a
database $x$ is not too different from the likelihood of obtaining the same
answer on adjacent databases, i.e. databases which differ from $x$ in only
one individual. Information flow is an area of Security concerned with the
problem of controlling the leakage of confidential information in programs
and protocols. Nowadays, one of the most established approaches to quantify
and to reason about leakage is based on the Rényi min-entropy version of
information theory.

In this paper, we analyze critically the notion of differential privacy in
light of the conceptual framework provided by the Rényi min information
theory. We show that there is a close relation between differential privacy
and leakage, due to the graph symmetries induced by the adjacency relation.
Furthermore, we consider the utility of the randomized answer, which
measures its expected degree of accuracy. We focus on certain kinds of
utility functions called "binary", which have a close correspondence with
the Rényi min mutual information. Again, it turns out that there can be a
tight correspondence between differential privacy and utility, depending on
the symmetries induced by the adjacency relation and by the query. Depending
on these symmetries we can also build an optimal-utility randomization
mechanism while preserving the required level of differential privacy. Our
main contribution is a study of the kind of structures that can be induced
by the adjacency relation and the query, and how to use them to derive
bounds on the leakage and achieve the optimal utility.

1 Introduction 

Databases are commonly used for obtaining statistical information about
their participants. Simple examples of statistical queries are, for
instance, the predominant disease of a certain population, or the average
salary. The fact that the answer is publicly available, however, constitutes
a threat for the privacy of the individuals.

In order to illustrate the problem, consider a set of individuals $Ind$
whose attribute of interest^1 has values in $Val$. A particular database is
formed by a subset of $Ind$, where a certain value in $Val$ is associated to
each participant. A query is a function $f : \mathcal{X} \to \mathcal{Y}$,
where $\mathcal{X}$ is the set of all possible databases, and $\mathcal{Y}$
is the domain of the answers.

For example, let $Val$ be the set of possible salaries and let $f$ represent
the query "what is the average salary of the participants in the database".
In principle we would like to consider the global information relative to a
database $x$ as public, and the individual information about a participant
$i$ as private. Namely, we would like to be able to obtain $f(x)$ without
being able to infer the salary of $i$. However, this is not always possible.
In particular, if the number of participants in $x$ is known (say $n$), then
removing $i$ from the database would allow an adversary to infer $i$'s
salary by querying the new database $x'$ and applying the formula
$n f(x) - (n - 1) f(x')$. By analogous reasoning, not only the removal but
also the addition of an individual is a threat to that individual's privacy.
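The differencing attack described above can be sketched in a few lines. This is only an illustration: the salary values and the function names `average` and `infer_removed_salary` are hypothetical and do not come from the paper.

```python
def average(salaries):
    """Exact (non-randomized) average-salary query f."""
    return sum(salaries) / len(salaries)


def infer_removed_salary(n, f_x, f_x_prime):
    """Recover the removed participant's salary as n*f(x) - (n-1)*f(x')."""
    return n * f_x - (n - 1) * f_x_prime


# Hypothetical database x with n = 4 participants; the last one is the target i.
x = [2000.0, 3000.0, 2500.0, 4500.0]
x_prime = x[:-1]  # the same database after i has been removed

recovered = infer_removed_salary(len(x), average(x), average(x_prime))
print(recovered)
```

Two exact answers and the database size are enough to reconstruct the target's value, which is the motivation for perturbing the output.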

Another kind of private information we may want to protect is whether an
individual $i$ is participating or not in a database. In this case, if we
know for instance that $i$ earns, say, 5K Euros/month, and all the other
individuals in $Ind$ earn less than 4K Euros/month, then knowing that
$f(x) \geq 4$K Euros/month will reveal immediately that $i$ is in the
database $x$.

A common solution to the above problems is to introduce some output
perturbation mechanism based on randomization: instead of the exact answer
$f(x)$ we report a "noisy" answer. Namely, we use some randomized function
$\mathcal{K}$, which produces values in some domain^2 $\mathcal{Z}$
according to some probability distribution that depends on the input
$x \in \mathcal{X}$. Of course for certain distributions it may still be
possible to guess the value of an individual with a high probability of
success. The notion of differential privacy, due to Dwork [10,13,11,12], is
a proposal to control the risk of violating privacy for both kinds of
threats described above (value and participation). The idea is to say that
$\mathcal{K}$ satisfies $\epsilon$-differential privacy (for some
$\epsilon > 0$) if the ratio between the probabilities that two adjacent
databases give the same answer is bounded by $e^\epsilon$, where by
"adjacent" we mean that the databases differ in only one individual (either
in the value of an individual or in the presence/absence of an individual).
Often we will abbreviate "$\epsilon$-differential privacy" as
$\epsilon$-d.p.

Obviously, the smaller $\epsilon$ is, the greater is the privacy protection.
In particular, when $\epsilon$ is close to 0 the output of $\mathcal{K}$ is
nearly independent from the input (all distributions are almost equal).
Unfortunately, such a $\mathcal{K}$ is practically useless. The utility,
i.e. the capability to retrieve accurate answers from the reported ones, is
the other important characteristic of $\mathcal{K}$, and it is clear that
there is a trade-off between utility and privacy. On the other hand, these
two notions are not the complete opposite of each other, because utility
concerns the relation between the reported answer and the real answer, while
privacy concerns the relation between the reported answer and the
information in the database. This asymmetry makes the problem of finding a
good compromise between the two more interesting.

^1 In general we could be interested in several attributes simultaneously,
and in this case $Val$ would be a set of tuples.

^2 The new domain $\mathcal{Z}$ may coincide with $\mathcal{Y}$, but not
necessarily. It depends on how the randomization mechanism is defined.

At this point, we would like to remark on an intriguing analogy between the
area of differential privacy and that of quantitative information flow
(QIF), both in the motivations and in the basic conceptual framework.
Information flow is concerned with the leakage of secret information through
computer systems, and the attribute "quantitative" refers to the fact that
we are interested in measuring the amount of leakage, not just its
occurrence. One of the most established approaches to QIF is based on
information theory: the idea is that a system is seen as a channel in the
information-theoretic sense, where the secret is the input and the
observables are the output. The entropy of the input represents its
vulnerability, i.e. how easy it is for an attacker to guess the secret. We
distinguish between the a priori entropy (before the observable) and the a
posteriori entropy (given the observable). The difference between the two
gives the mutual information and represents, intuitively, the increase in
vulnerability due to the observables produced by the system, so it is
naturally considered as a measure of the leakage. The notion of entropy is
related to the kind of attack we want to model, and in this paper we focus
on the Rényi min-entropy [18], which represents the so-called one-try
attacks. In recent years there has been a lot of research aimed at
establishing the foundations of this framework jl9l7ll6l5I5]. It is worth
pointing out that the a posteriori Rényi min-entropy corresponds to the
concept of Bayes risk, which has also been proposed as a measure of the
effectiveness of attacks [8,6,17].

The analogy hinted at above between differential privacy and QIF is based on
the following observations: at the motivational level, the concern about
privacy is akin to the concern about information leakage. At the conceptual
level, the randomized function $\mathcal{K}$ can be seen as an
information-theoretic channel, and the limit case of $\epsilon = 0$, for
which the privacy protection is total, corresponds to a 0-capacity
channel^3 (the rows of the channel matrix are all identical), which does not
allow any leakage. Another promising similarity is that the notion of
utility (in the binary case) corresponds closely to the Bayes risk.

In this paper we investigate the notion of differential privacy, and its
implications, in light of the min-entropy information-theoretic framework
developed for QIF. In particular, we wish to explore the following natural
questions:

1. Does $\epsilon$-d.p. induce a bound on the information leakage of
$\mathcal{K}$?

2. Does $\epsilon$-d.p. induce a bound on the information leakage relative
to an individual?

3. Does $\epsilon$-d.p. induce a bound on the utility?

4. Given $f$ and $\epsilon$, can we construct a $\mathcal{K}$ which
satisfies $\epsilon$-d.p. and maximum utility?

^3 The channel capacity is the maximum mutual information over all possible
input distributions.

We will see that the answers to (1) and (2) are positive, and we provide
bounds that are tight, in the sense that for every $\epsilon$ there is a
$\mathcal{K}$ whose leakage reaches the bound. For (3) we are able to give a
tight bound in some cases which depend on the structure of the query, and
for the same cases, we are able to construct an oblivious^4 $\mathcal{K}$
with maximum utility, as requested by (4).

Part of the above results have already appeared in [1], and are based on
techniques which exploit the graph structure that the adjacency relation
induces on the domain of all databases $\mathcal{X}$ and on the domain of
the correct answers $\mathcal{Y}$. The main contribution of this paper is an
extension of those techniques, and a coherent graph-theoretic framework for
reasoning about the symmetries of those domains. More specifically:

— We explore the graph-theoretic foundations of the adjacency relation, and
point out various types of symmetries which allow us to establish a strict
link between differential privacy and information leakage.

— We give a tight bound for question (2) above, strictly smaller than the
one in [1].

— We extend the structures for which we give a positive answer to questions
(3) and (4) above. In [1] the only case considered was the class of graphs
with single-orbit automorphisms. Here we show that the results hold also for
distance-regular graphs and a variant of vertex-transitive graphs.

In this paper we focus on the case in which $\mathcal{X}$, $\mathcal{Y}$ and
$\mathcal{Z}$ are finite, leaving the more general case for future work.

2 Preliminaries 

2.1 Database domain and Differential privacy

Let $Ind$ be a finite set of individuals that may participate in a database
and $Val$ a finite set of possible values for the attribute of interest of
these individuals. In order to capture in a uniform way the presence/absence
of an individual in the database, as well as its value, we enrich the set of
possible values with an element $a$ representing the absence of the
individual. Thus the set of all possible databases is the set
$\mathcal{X} = V^{Ind}$, where $V = Val \cup \{a\}$. We will use $u$ and $v$
to denote the cardinalities of $Ind$ and $V$, $|Ind|$ and $|V|$,
respectively. Hence we have that $|\mathcal{X}| = v^u$. A database $x$ can
be represented as a $u$-tuple $v_0 v_1 \ldots v_{u-1}$ where each
$v_i \in V$ is the value of the corresponding individual. Two databases
$x, x'$ are adjacent (or neighbors), written $x \sim x'$, if they differ in
the value of exactly one individual. For instance, for $u = 3$,
$v_0 v_1 v_2$ and $v_0 w_1 v_2$, with $w_1 \neq v_1$, are adjacent. The
structure $(\mathcal{X}, \sim)$ forms an undirected graph.

^4 A randomized function $\mathcal{K}$ is oblivious if its probability
distribution depends only on the answer to the query, and not on the
database.

Intuitively, differential privacy is based on the idea that a randomized
query function provides sufficient protection if the ratio between the
probabilities of two adjacent databases to give a certain answer is bounded
by $e^\epsilon$, for some given $\epsilon > 0$. Formally:

Definition 1 ([12]). A randomized function $\mathcal{K}$ from $\mathcal{X}$
to $\mathcal{Z}$ satisfies $\epsilon$-differential privacy if for all pairs
$x, x' \in \mathcal{X}$, with $x \sim x'$, and all
$S \subseteq \mathcal{Z}$, we have that:

\[ \Pr[\mathcal{K}(x) \in S] \le e^\epsilon \times \Pr[\mathcal{K}(x') \in S] \]

The above definition takes into account the possibility that $\mathcal{Z}$
is a continuous domain. In our case, since $\mathcal{Z}$ is finite, the
probability distribution is discrete, and we can rewrite the property of
$\epsilon$-d.p. more simply as (using the notation of conditional
probabilities, and considering both quotients):

\[ \frac{1}{e^\epsilon} \le \frac{\Pr[Z = z \mid X = x]}{\Pr[Z = z \mid X = x']} \le e^\epsilon
   \quad \text{for all } x, x' \in \mathcal{X} \text{ with } x \sim x', \text{ and all } z \in \mathcal{Z} \]

where $X$ and $Z$ represent the random variables associated to $\mathcal{X}$
and $\mathcal{Z}$, respectively.
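For a finite mechanism given as a row-stochastic matrix, the discrete condition can be checked directly. The following is a minimal sketch, not part of the paper: the function name `satisfies_dp`, the adjacency predicate, the numerical tolerance and the two-row example matrix are all assumptions made for illustration.

```python
import math


def satisfies_dp(matrix, adjacent, eps, tol=1e-12):
    """Check p(z|x) <= e^eps * p(z|x') for all adjacent x, x' and all z.

    Every ordered pair (x, x') is visited, so both quotients are covered.
    """
    bound = math.exp(eps)
    n = len(matrix)
    for x in range(n):
        for x2 in range(n):
            if x != x2 and adjacent(x, x2):
                for p, q in zip(matrix[x], matrix[x2]):
                    if p > bound * q + tol:
                        return False
    return True


# Hypothetical two-database example in which the two databases are adjacent.
rr = [[0.75, 0.25],
      [0.25, 0.75]]
always_adjacent = lambda a, b: True

print(satisfies_dp(rr, always_adjacent, math.log(3)))  # ratio 3 is allowed
print(satisfies_dp(rr, always_adjacent, math.log(2)))  # ratio 3 exceeds e^eps = 2
```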

2.2 Information theory and application to information flow 

In the following, $X, Y$ denote two discrete random variables with carriers
$\mathcal{X} = \{x_0, \ldots, x_{n-1}\}$,
$\mathcal{Y} = \{y_0, \ldots, y_{m-1}\}$, and probability distributions
$p_X(\cdot)$, $p_Y(\cdot)$, respectively. An information-theoretic channel
is constituted by an input $X$, an output $Y$, and the matrix of conditional
probabilities $p_{Y|X}(\cdot \mid \cdot)$, where $p_{Y|X}(y \mid x)$
represents the probability that $Y$ is $y$ given that $X$ is $x$. We shall
omit the subscripts on the probabilities when they are clear from the
context.

Rényi min-entropy. In [18], Rényi introduced a one-parameter family of
entropy measures, intended as a generalization of Shannon entropy. The Rényi
entropy of order $\alpha$ ($\alpha > 0$, $\alpha \neq 1$) of a random
variable $X$ is defined as
$H_\alpha(X) = \frac{1}{1-\alpha} \log_2 \sum_{x \in \mathcal{X}} p(x)^\alpha$.
We are particularly interested in the limit of $H_\alpha$ as $\alpha$
approaches $\infty$. This is called min-entropy. It can be proven that

\[ H_\infty(X) \stackrel{\mathrm{def}}{=} \lim_{\alpha \to \infty} H_\alpha(X) = -\log_2 \max_{x \in \mathcal{X}} p(x). \]

Rényi defined also the $\alpha$-generalization of other
information-theoretic notions, like the Kullback-Leibler divergence.
However, he did not define the $\alpha$-generalization of the conditional
entropy, and there is no general agreement on what it should be. For the
case $\alpha = \infty$, we adopt here the definition of conditional entropy
proposed by Smith in [19]:

\[ H_\infty(X \mid Y) = -\log_2 \sum_{y \in \mathcal{Y}} p(y) \max_{x \in \mathcal{X}} p(x \mid y) \qquad (1) \]

Analogously to the Shannon case, we can define the Rényi mutual information
$I_\infty$ as $H_\infty(X) - H_\infty(X \mid Y)$, and the capacity
$C_\infty$ as $\max_{p_X(\cdot)} I_\infty(X; Y)$. It has been proven in [7]
that $C_\infty$ is obtained at the uniform distribution, and that it is
equal to the logarithm of the sum of the maxima of each column in the
channel matrix, i.e.

\[ C_\infty = \log_2 \sum_{y \in \mathcal{Y}} \max_{x \in \mathcal{X}} p(y \mid x). \]
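The quantities just defined are easy to compute for a finite channel. The sketch below is illustrative only: the function names and the example channel are assumptions, and the capacity is taken to be the base-2 logarithm of the sum of the column maxima. It uses the identity $p(y)\,p(x \mid y) = p(x)\,p(y \mid x)$ to evaluate (1) from the prior and the channel matrix.

```python
import math


def min_entropy(prior):
    """H_inf(X) = -log2 max_x p(x)."""
    return -math.log2(max(prior))


def cond_min_entropy(prior, channel):
    """H_inf(X|Y) = -log2 sum_y max_x p(x) p(y|x), i.e. equation (1)."""
    outputs = range(len(channel[0]))
    total = sum(max(prior[x] * channel[x][y] for x in range(len(prior)))
                for y in outputs)
    return -math.log2(total)


def min_leakage(prior, channel):
    """I_inf(X;Y) = H_inf(X) - H_inf(X|Y)."""
    return min_entropy(prior) - cond_min_entropy(prior, channel)


def min_capacity(channel):
    """log2 of the sum of the column maxima of the channel matrix."""
    outputs = range(len(channel[0]))
    return math.log2(sum(max(row[y] for row in channel) for y in outputs))


# Hypothetical binary channel, rows indexed by inputs.
channel = [[0.75, 0.25],
           [0.25, 0.75]]
uniform = [0.5, 0.5]
print(min_leakage(uniform, channel), min_capacity(channel))
```

On this example the leakage under the uniform prior coincides with the capacity, as the statement about the uniform distribution predicts.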
Interpretation in terms of attacks: Rényi min-entropy can be related to a
model of adversary who is allowed to ask exactly one question, which must be
of the form "is $X = x$?" (one-try attacks). More precisely, $H_\infty(X)$
represents the (logarithm of the inverse of the) probability of success for
this kind of attack with the best strategy, which consists, of course, in
choosing the $x$ with the maximum probability.

As for $H_\infty(X \mid Y)$, it represents the (logarithm of the) inverse of
the (expected value of the) probability that the same kind of adversary
succeeds in guessing the value of $X$ a posteriori, i.e. after observing the
result of $Y$. The complement of this probability is also known as Bayes
risk. Since in general $X$ and $Y$ are correlated, observing $Y$ increases
the probability of success. Indeed we can prove formally that
$H_\infty(X \mid Y) \le H_\infty(X)$, with equality if $X$ and $Y$ are
independent. $I_\infty(X; Y)$ corresponds to the (logarithm of the) ratio
between the a posteriori and the a priori probabilities of success, which is
a natural notion of leakage. Note that $I_\infty(X; Y) \ge 0$, which seems
desirable for a good notion of leakage.

3 Graph symmetries 

In this section we explore some classes of graphs that allow us to derive a
strict correspondence between $\epsilon$-d.p. and the a posteriori entropy
of the input.

Let us first recall some basic notions. Given a graph $G = (V, \sim)$, the
distance $d(v, w)$ between two vertices $v, w \in V$ is the number of edges
in a shortest path connecting them. The diameter of $G$ is the maximum
distance between any two vertices in $V$. The degree of a vertex is the
number of edges incident to it. $G$ is called regular if every vertex has
the same degree. A regular graph with vertices of degree $k$ is called a
$k$-regular graph. An automorphism of $G$ is a permutation $\sigma$ of the
vertex set $V$, such that for any pair of vertices $x, x'$, if $x \sim x'$,
then $\sigma(x) \sim \sigma(x')$. If $\sigma$ is an automorphism, and $v$ a
vertex, the orbit of $v$ under $\sigma$ is the set
$\{v, \sigma(v), \ldots, \sigma^{k-1}(v)\}$ where $k$ is the smallest
positive integer such that $\sigma^k(v) = v$. Clearly, the orbits of the
vertices under $\sigma$ define a partition of $V$.

The following two definitions introduce the classes of graphs that we are
interested in. The first class is well known in the literature.

Definition 2. Given a graph $G = (V, \sim)$, we say that $G$ is
distance-regular if there exist integers $b_i, c_i$, $i = 0, \ldots, d$,
such that for any two vertices $v, w$ in $V$ with distance $i = d(v, w)$,
there are exactly $c_i$ neighbors of $w$ in $G_{i-1}(v)$ and $b_i$ neighbors
of $w$ in $G_{i+1}(v)$, where $G_i(v)$ is the set of vertices $y$ of $G$
with $d(v, y) = i$.

Some examples of distance-regular graphs are illustrated in Figure 1.
The next class is a variant of the VT (vertex-transitive) class:

Definition 3. A graph $G = (V, \sim)$ is VT+ (vertex-transitive +) if there
are $n$ automorphisms $\sigma_0, \sigma_1, \ldots, \sigma_{n-1}$, where
$n = |V|$, such that, for every vertex $v \in V$, we have that
$\{\sigma_i(v) \mid 0 \le i \le n - 1\} = V$.

Fig. 1. Some distance-regular graphs with degree 3: (a) tetrahedral graph,
(b) cubical graph, (c) Petersen graph.



In particular, the graphs for which there exists an automorphism $\sigma$
which induces only one orbit are VT+: in fact it is sufficient to define
$\sigma_i = \sigma^i$ for all $i$ from 0 to $n - 1$. Figure 2 illustrates
some graphs with a single-orbit automorphism.




Fig. 2. Some VT+ graphs: (a) cycle, degree 2; (b) degree 4; (c) clique,
degree 5.



From graph theory we know that neither of the two classes subsumes the
other. They have however a non-empty intersection, which contains in
particular all the structures of the form $(V^{Ind}, \sim)$, i.e. the
database domains.

Proposition 1. The structure $(\mathcal{X}, \sim) = (V^{Ind}, \sim)$ is both
a distance-regular graph and a VT+ graph.

Figure 3 illustrates some examples of structures $(V^{Ind}, \sim)$. Note
that when $|Ind| = n$ and $|V| = 2$, $(V^{Ind}, \sim)$ is the
$n$-dimensional hypercube.

The situation is summarized in Figure 4. We remark that in general the
graphs $(V^{Ind}, \sim)$ do not have a single-orbit automorphism. The only
exceptions are the two simplest structures ($|V| = 2$, $|Ind| \le 2$).
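To make the database graph concrete, the following sketch builds $(V^{Ind}, \sim)$ for small parameters and counts, by breadth-first search, how many vertices lie at each distance from a given vertex. It is only an illustration: the helper names and the value set `("a", "b", "absent")` are hypothetical. Finding the same distance profile at every vertex is a necessary consequence of the symmetry stated in Proposition 1, not a proof of it.

```python
from collections import deque
from itertools import product


def database_graph(u, values):
    """Vertices are u-tuples over `values`; edges join tuples that differ
    in exactly one position."""
    nodes = product(values, repeat=u)
    return {x: [x[:i] + (w,) + x[i + 1:]
                for i in range(u) for w in values if w != x[i]]
            for x in nodes}


def distance_profile(adj, source):
    """Number of vertices at each graph distance from `source` (BFS)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        x = queue.popleft()
        for y in adj[x]:
            if y not in dist:
                dist[y] = dist[x] + 1
                queue.append(y)
    profile = [0] * (max(dist.values()) + 1)
    for d in dist.values():
        profile[d] += 1
    return profile


# Three individuals, two real values plus one marker for absence.
adj = database_graph(3, ("a", "b", "absent"))
profiles = {tuple(distance_profile(adj, x)) for x in adj}
print(profiles)  # a single profile shared by all 27 vertices

# With |V| = 2 the structure is the hypercube: every vertex has degree u.
cube = database_graph(3, (0, 1))
print({len(neighbors) for neighbors in cube.values()})
```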

The two symmetry classes defined above, distance-regular and VT+, will be
used in the next section to transform a generic channel matrix into a matrix
with a symmetric structure, while preserving the a posteriori min-entropy
and the $\epsilon$-d.p. This is the core of our technique to establish the
relation between differential privacy and quantitative information flow,
depending on the structure induced by the database adjacency relation.

Fig. 3. Some $(V^{Ind}, \sim)$ graphs (for readability's sake, only part of
one of the graphs is shown).

Fig. 4. Venn diagram for the classes of graphs considered in this section.
Here, $S = \{V^{Ind} \mid |V| = 2, |Ind| \le 2\}$.



4 Deriving the relation between differential privacy and 
QIF on the basis of the graph structure 

This section contains the main technical contribution of the paper: a
general technique for determining the relation between
$\epsilon$-differential privacy and leakage, and between
$\epsilon$-differential privacy and utility, depending on the graph
structure induced by $\sim$ and $f$. The idea is to use the symmetries of
the graph structure to transform the channel matrix into an equivalent
matrix with certain regularities, which allow us to establish the link
between $\epsilon$-differential privacy and the a posteriori min-entropy.

Let us illustrate briefly this transformation. Consider a channel whose
matrix $M$ has at least as many columns as rows. First, we transform $M$
into a matrix $M'$ in which each of the first $n$ columns has a maximum in
the diagonal, and the remaining columns are all 0's. Second, under the
assumption that the input domain is distance-regular or VT+, we transform
$M'$ into a matrix $M''$ whose diagonal elements are all the same, and
coincide with the maximum element of $M''$, which we denote here by
$\max^{M''}$. These steps are illustrated in Figure 5.

Fig. 5. Matrix transformations for distance-regular and VT+ graphs: $M'$ is
transformed into $M''$ by Lemma 2 in the distance-regular case and by
Lemma 3 in the VT+ case.



We are now going to present our technique formally. Let us first fix some
notation: in the rest of this section we consider channels with input $A$
and output $B$, with carriers $\mathcal{A}$ and $\mathcal{B}$ respectively,
and we assume that the probability distribution of $A$ is uniform.
Furthermore, we assume that $|\mathcal{A}| = n \le |\mathcal{B}| = m$. We
also assume an adjacency relation $\sim$ on $\mathcal{A}$, i.e. that
$(\mathcal{A}, \sim)$ is an undirected graph structure. With a slight abuse
of notation, we will also write $i \sim h$ when $i$ and $h$ are associated
to adjacent elements of $\mathcal{A}$, and we will write $d(i, h)$ to denote
the distance between the elements of $\mathcal{A}$ associated to $i$ and
$h$.

We note that a channel matrix $M$ satisfies $\epsilon$-d.p. if for each
column $j$ and for each pair of rows $i$ and $h$ such that $i \sim h$ we
have that:

\[ \frac{1}{e^\epsilon} \le \frac{M_{i,j}}{M_{h,j}} \le e^\epsilon \]

The a posteriori entropy of a channel with matrix $M$ will be denoted by
$H_\infty^{M}(A \mid B)$. The next lemma concerns the first step of the
transformation.

Lemma 1. Consider a channel with matrix $M$. Assume that $M$ satisfies
$\epsilon$-d.p. Then it is possible to transform $M$ into a matrix $M'$ such
that:

— Each of the first $n$ columns has a maximum in the diagonal, i.e.
$M'_{i,i} = \max_h M'_{h,i}$ for each $i$ from 0 to $n - 1$.

— The rest of the columns contain only 0's, i.e. $M'_{i,j} = 0$ for each $i$
from 0 to $n - 1$ and each $j$ from $n$ to $m - 1$.

— $M'$ satisfies $\epsilon$-d.p.

— $H_\infty^{M'}(A \mid B) = H_\infty^{M}(A \mid B)$.

The next lemma concerns the second step of the transformation, for the case
of distance-regular graphs.

Lemma 2. Consider a channel with matrix $M'$. Assume that $M'$ satisfies
$\epsilon$-d.p., that the first $n$ columns have maxima in the diagonal, and
that the rest of the columns contain only 0's. Assume that
$(\mathcal{A}, \sim)$ is distance-regular. Then it is possible to transform
$M'$ into a matrix $M''$ such that:

— The elements of the diagonal are all the same, and are equal to the
maximum of the matrix, i.e. $M''_{i,i} = \max^{M''}$ for each $i$ from 0 to
$n - 1$.

— The rest of the columns contain only 0's.

— $M''$ satisfies $\epsilon$-d.p.

— $H_\infty^{M''}(A \mid B) = H_\infty^{M'}(A \mid B)$.

The next lemma concerns the second step of the transformation, for the case
of VT+ graphs.

Lemma 3. Consider a channel with matrix $M'$ satisfying the assumptions of
Lemma 2, except for the assumption about distance-regularity, which we
replace by the assumption that $(\mathcal{A}, \sim)$ is VT+. Then it is
possible to transform $M'$ into a matrix $M''$ with the same properties as
in Lemma 2.

Note that the fact that in $M''$ the diagonal elements are all equal to the
maximum $\max^{M''}$ implies that
$2^{-H_\infty^{M''}(A \mid B)} = \max^{M''}$.

Once we have a matrix with the properties of $M''$, we can use again the
graph structure of $\mathcal{A}$ to determine a bound on
$H_\infty^{M''}(A \mid B)$.

First we note that the property of $\epsilon$-d.p. induces a relation
between the ratio of elements at any distance:

Remark 1. Let $M$ be a matrix satisfying $\epsilon$-d.p. Then, for any
column $j$, and any pair of rows $i$ and $h$ we have that:

\[ \frac{1}{e^{\epsilon\, d(i,h)}} \le \frac{M_{i,j}}{M_{h,j}} \le e^{\epsilon\, d(i,h)} \]



In particular, if we know that the diagonal elements of $M$ are equal to the
maximum element $\max^{M}$, then for each element $M_{i,j}$ we have that:

\[ M_{i,j} \ge \frac{\max^{M}}{e^{\epsilon\, d(i,j)}} \qquad (2) \]



Let us fix a row, say row $r$. For each distance $d$ from 0 to the diameter
of the graph, let $n_d$ be the number of elements $M_{r,j}$ that are at
distance $d$ from the corresponding diagonal element $M_{j,j}$, i.e. such
that $d(r, j) = d$. (Clearly, $n_d$ depends on the structure of the graph.)
Since the elements of row $r$ represent a probability distribution, we
obtain the following inequality:

\[ \max^{M} \sum_d \frac{n_d}{e^{\epsilon d}} \le 1 \]

from which we derive immediately a bound on the a posteriori min-entropy.
Putting together all the steps of this section, we obtain our main result.

Theorem 1. Consider a matrix $M$, and let $r$ be a row of $M$. Assume that
$(\mathcal{A}, \sim)$ is either distance-regular or VT+, and that $M$
satisfies $\epsilon$-d.p. For each distance $d$ from 0 to the diameter of
$(\mathcal{A}, \sim)$, let $n_d$ be the number of nodes $j$ at distance $d$
from $r$. Then we have that:

\[ H_\infty^{M}(A \mid B) \ge -\log_2 \frac{1}{\sum_d \frac{n_d}{e^{\epsilon d}}} \qquad (3) \]



Note that this bound is tight, in the sense that we can build a matrix for
which (3) holds with equality. It is sufficient to define each element
$M_{i,j}$ according to (2) (with equality instead of inequality, of course).
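As a sanity check of the tightness claim, the sketch below builds such a matrix on a cycle of $n$ nodes, a graph with a single-orbit automorphism (the rotation) and hence VT+. The choice of the cycle, the function names and the value of $\epsilon$ are assumptions made for illustration only.

```python
import math


def cycle_distance(i, j, n):
    """Graph distance between nodes i and j on a cycle with n nodes."""
    k = abs(i - j)
    return min(k, n - k)


def tight_matrix(n, eps):
    """Matrix with M[i][j] = top / e^(eps * d(i, j)), i.e. (2) with equality,
    where top = 1 / sum_d n_d / e^(eps * d)."""
    counts = {}
    for j in range(n):  # n_d: number of nodes at distance d from node 0
        d = cycle_distance(0, j, n)
        counts[d] = counts.get(d, 0) + 1
    top = 1.0 / sum(n_d * math.exp(-eps * d) for d, n_d in counts.items())
    matrix = [[top * math.exp(-eps * cycle_distance(i, j, n))
               for j in range(n)] for i in range(n)]
    return matrix, top


def posterior_min_entropy_uniform(matrix):
    """H_inf(A|B) for a uniform input: -log2 of the mean of column maxima."""
    n = len(matrix)
    cols = range(len(matrix[0]))
    vulnerability = sum(max(row[j] for row in matrix) for j in cols) / n
    return -math.log2(vulnerability)


matrix, top = tight_matrix(6, math.log(2))
print([round(sum(row), 12) for row in matrix])  # every row is a distribution
print(posterior_min_entropy_uniform(matrix), -math.log2(top))
```

Both printed entropies agree, so on this example the bound (3) is attained.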

In the next sections, we will see how to use this theorem for establishing a
bound on the leakage and on the utility.



5 Application to leakage 

As already hinted in the introduction, we can regard $\mathcal{K}$ as a
channel with input $X$ and output $Z$. From Proposition 1 we know that
$(\mathcal{X}, \sim)$ is both distance-regular and VT+, so we can apply
Theorem 1. Let us fix a particular database $x \in \mathcal{X}$. The number
of databases at distance $d$ from $x$ is

\[ n_d = \binom{u}{d} (v - 1)^d \qquad (4) \]

where $u = |Ind|$ and $v = |V|$. In fact, recall that $x$ can be represented
as a $u$-tuple with values in $V$. We need to select $d$ individuals in the
$u$-tuple and then change their values, and each of them can be changed in
$v - 1$ different ways.

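Using the $n_d$ from (4) in Theorem 1 we obtain a binomial expansion in the
denominator, namely:

\[ H_\infty^{M}(X \mid Z) \ge -\log_2 \frac{1}{\sum_{d=0}^{u} \binom{u}{d} \frac{(v-1)^d}{e^{\epsilon d}}}
   = u \log_2 \frac{v - 1 + e^\epsilon}{e^\epsilon} \]

which gives the following result: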

Theorem 2. If $\mathcal{K}$ satisfies $\epsilon$-d.p., then for the uniform
input distribution the information leakage is bounded from above as follows:

\[ I_\infty(X; Z) \le u \log_2 \frac{v\, e^\epsilon}{v - 1 + e^\epsilon} \]
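The binomial step and the resulting bound can be checked numerically. The sketch below is illustrative: the function names and the sample parameters ($u = 5$, $v = 4$, $\epsilon = 0.5$) are assumptions, not values from the paper.

```python
import math


def n_d(u, v, d):
    """Number of databases at distance d from a fixed one, as in (4)."""
    return math.comb(u, d) * (v - 1) ** d


def entropy_bound_sum(u, v, eps):
    """Lower bound on H_inf(X|Z): log2 of sum_d n_d / e^(eps * d)."""
    return math.log2(sum(n_d(u, v, d) * math.exp(-eps * d)
                         for d in range(u + 1)))


def entropy_bound_closed(u, v, eps):
    """The same bound after applying the binomial theorem."""
    return u * math.log2((v - 1 + math.exp(eps)) / math.exp(eps))


def leakage_bound(u, v, eps):
    """Upper bound on I_inf(X;Z) for the uniform input distribution."""
    return u * math.log2(v * math.exp(eps) / (v - 1 + math.exp(eps)))


u, v, eps = 5, 4, 0.5
print(entropy_bound_sum(u, v, eps), entropy_bound_closed(u, v, eps))
print(leakage_bound(u, v, eps), u * math.log2(v))  # bound vs. H_inf(X)
```

For $\epsilon = 0$ the bound evaluates to 0, matching the intuition that a 0-capacity channel leaks nothing, and it stays below $u \log_2 v$, the min-entropy of the uniform input.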

Fig. 6. Schema of an oblivious randomized function $\mathcal{K}$
($\epsilon$-differentially private randomized function).



We consider now the leakage for a single individual. Let us fix a database
$x$, and a particular individual $i$ in $Ind$. The possible ways in which we
can change the value of $i$ in $x$ are $v - 1$. All the new databases
obtained in this way are adjacent to each other, i.e. the graph structure
associated to the input is a clique of $v$ nodes. Therefore we obtain
$n_d = 1$ for $d = 0$, $n_d = v - 1$ for $d = 1$, and $n_d = 0$ otherwise.
By substituting this value of $n_d$ in Theorem 1, we get

\[ H_\infty^{ind}(Val \mid Z) \ge -\log_2 \frac{1}{1 + \frac{v - 1}{e^\epsilon}}
   = -\log_2 \frac{e^\epsilon}{v - 1 + e^\epsilon} \]

which leads to the following result:

Proposition 2. Assume that $\mathcal{K}$ satisfies $\epsilon$-d.p. Then for
the uniform distribution on $V$ the information leakage for an individual is
bounded from above as follows:

\[ I_\infty^{ind}(Val; Z) \le \log_2 \frac{v\, e^\epsilon}{v - 1 + e^\epsilon} \]

Note that the bound on the leakage for an individual does not depend on the 
size of Ind, nor on the database x that we fix. 



6 Application to utility 

We turn now our attention to the issue of utility. We focus on the case in
which $\mathcal{K}$ is oblivious, which means that it depends only on the
(exact) answer to the query, i.e. on the value of $f(x)$, and not on $x$.

An oblivious function can be decomposed into the concatenation of two
channels, one representing the function $f$, and the other representing the
randomization mechanism $\mathcal{H}$ added as output perturbation. The
situation is illustrated in Figure 6.

The standard way to define utility is by means of guess and gain functions.
The functionality of the first is
$guess : \mathcal{Z} \to \mathcal{Y}$, and it represents the user's strategy
to retrieve the correct answer from the reported one. The functionality of
the latter is $gain : \mathcal{Y} \times \mathcal{Y} \to \mathbb{R}$. The
value $gain(y, y')$ represents the reward for guessing the answer $y$ when
the correct answer is $y'$. The utility $\mathcal{U}$ can then be defined as
the expected gain:

\[ \mathcal{U}(Y, Z) = \sum_{y, z} p(y, z)\, gain(guess(z), y) \]

We focus here on the so-called binary gain function, which is defined as

\[ gain(y, y') = \begin{cases} 1 & \text{if } y = y' \\ 0 & \text{otherwise} \end{cases} \]

This kind of function represents the case in which there is no reason to
prefer one answer over another, except if it is the right answer. More
precisely, we get a gain if and only if we guess the right answer.

If the gain function is binary, and the guess function represents the user's
best strategy, i.e. it is chosen to optimize utility, then there is a
well-known correspondence between $\mathcal{U}$ and the Bayes risk / the a
posteriori min-entropy. Such correspondence is expressed by the following
proposition:

Proposition 3. Assume that gain is binary and guess is optimal. Then:

\[ \mathcal{U}(Y, Z) = \sum_z \max_y \big( p(z \mid y)\, p(y) \big) = 2^{-H_\infty(Y \mid Z)} \]
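The correspondence in Proposition 3 can be illustrated on a small example by computing the expected gain explicitly. In the sketch below the prior, the channel and all function names are hypothetical; `optimal_guess` picks, for each reported answer, the true answer with the largest joint probability.

```python
def binary_gain(y, y_true):
    """1 if the guessed answer is the correct one, 0 otherwise."""
    return 1.0 if y == y_true else 0.0


def utility(prior, channel, guess):
    """Expected gain: sum over y, z of p(y) p(z|y) gain(guess(z), y)."""
    return sum(prior[y] * channel[y][z] * binary_gain(guess(z), y)
               for y in range(len(prior))
               for z in range(len(channel[0])))


def optimal_guess(prior, channel):
    """For each reported z, guess the y that maximizes p(y) p(z|y)."""
    best = [max(range(len(prior)), key=lambda y: prior[y] * channel[y][z])
            for z in range(len(channel[0]))]
    return lambda z: best[z]


def posterior_vulnerability(prior, channel):
    """sum_z max_y p(z|y) p(y), the middle expression of Proposition 3."""
    return sum(max(prior[y] * channel[y][z] for y in range(len(prior)))
               for z in range(len(channel[0])))


# Hypothetical prior over three answers and a 3x3 randomization mechanism.
prior = [0.5, 0.3, 0.2]
channel = [[0.6, 0.3, 0.1],
           [0.2, 0.6, 0.2],
           [0.1, 0.3, 0.6]]

best = optimal_guess(prior, channel)
print(utility(prior, channel, best), posterior_vulnerability(prior, channel))
print(utility(prior, channel, lambda z: 0))  # a non-optimal constant guess
```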

In order to analyze the implications of the $\epsilon$-d.p. requirement on
the utility, we need to consider the structure that the adjacency relation
induces on $\mathcal{Y}$. Let us define $\sim$ on $\mathcal{Y}$ as follows:
$y \sim y'$ if there are $x, x' \in \mathcal{X}$ such that $y = f(x)$,
$y' = f(x')$, and $x \sim x'$. Note that $\mathcal{K}$ satisfies
$\epsilon$-d.p. if and only if $\mathcal{H}$ satisfies $\epsilon$-d.p.

If $(\mathcal{Y}, \sim)$ is distance-regular or VT+, then we can apply
Theorem 1 to find a bound on the utility. In the following, we assume that
the distribution of $Y$ is uniform.

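Theorem 3. Consider a randomized mechanism $\mathcal{H}$, and let $y$ be an
element of $\mathcal{Y}$. Assume that $(\mathcal{Y}, \sim)$ is either
distance-regular or VT+ and that $\mathcal{H}$ satisfies $\epsilon$-d.p. For
each distance $d$ from 0 to the diameter of $(\mathcal{Y}, \sim)$, let $n_d$
be the number of nodes $y'$ at distance $d$ from $y$. Then we have that:

\[ \mathcal{U}(Y, Z) \le \frac{1}{\sum_d \frac{n_d}{e^{\epsilon d}}} \qquad (5) \]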

The above bound is tight, in the sense that (provided $(\mathcal{Y}, \sim)$
is distance-regular or VT+) we can construct a mechanism $\mathcal{H}$ which
satisfies (5) with equality. More precisely, define

\[ c = \frac{1}{\sum_d \frac{n_d}{e^{\epsilon d}}} \]

Then define $\mathcal{H}$ (here identified with its channel matrix for
simplicity) as follows:

\[ \mathcal{H}_{i,j} = \frac{c}{e^{\epsilon\, d(i,j)}} \qquad (6) \]



Theorem 4. Assume $(\mathcal{Y}, \sim)$ is distance-regular or VT+. Then the
matrix $\mathcal{H}$ defined in (6) satisfies $\epsilon$-d.p. and has
maximal utility:

\[ \mathcal{U}(Y, Z) = \frac{1}{\sum_d \frac{n_d}{e^{\epsilon d}}} \]



Note that we can always define $\mathcal{H}$ as in (6): the matrix so
defined will be a legal channel matrix, and it will satisfy $\epsilon$-d.p.
However, if $(\mathcal{Y}, \sim)$ is neither distance-regular nor VT+, then
the utility of such $\mathcal{H}$ is not necessarily optimal.

We end this section with an example (borrowed from [1]) to illustrate our
technique.

Example 1. Consider a database with electoral information where each row
corresponds to a voter and contains the following three fields:

— Id: a unique (anonymized) identifier assigned to each voter; 

— City: the name of the city where the user voted; 

— Candidate: the name of the candidate the user voted for. 

Consider the query "What is the city with the greatest number of votes for
a given candidate cand?". For such a query the binary utility function is
the natural choice: only the right city gives some gain, and all wrong
answers are equally bad. It is easy to see that every two answers are
neighbors, i.e. the graph structure of the answers is a clique.

Let us consider the scenario where $City = \{A, B, C, D, E, F\}$ and assume
for simplicity that there is a unique answer for the query, i.e., there are
no two cities with exactly the same number of individuals voting for
candidate cand. Table 1 shows two alternative mechanisms providing
$\epsilon$-differential privacy (with $\epsilon = \log 2$). The first one,
$M_1$, is based on the truncated geometric mechanism method used in [14] for
counting queries (here extended to the case where every pair of answers are
neighbors). The second mechanism, $M_2$, is obtained by applying the
definition (6). From Theorem 4 we know that for the uniform input
distribution $M_2$ gives optimal utility.

For the uniform input distribution, it is easy to see that U(M1) = 0.2242 < 
0.2857 = U(M2). Even for non-uniform distributions, our mechanism still 
provides better utility. For instance, for p(A) = p(F) = 1/10 and p(B) = p(C) = 
p(D) = p(E) = 1/5, we have U(M1) = 0.2412 < 0.2857 = U(M2). This is not 
too surprising: the geometric mechanism, as well as the Laplacian mechanism 
proposed by Dwork, performs very well when the domain of answers is provided 
with a metric and the utility function is not binary*. It also works well when 



* In the metric case the gain function can take into account the proximity of the 
reported answer to the real one, the idea being that a close answer, even if wrong, 
is better than a distant one. 
(a) M1: truncated geometric mechanism 

In/Out    A      B      C      D      E      F
A       0.535  0.060  0.052  0.046  0.040  0.267
B       0.465  0.069  0.060  0.053  0.046  0.307
C       0.405  0.060  0.069  0.060  0.053  0.353
D       0.353  0.053  0.060  0.069  0.060  0.405
E       0.307  0.046  0.053  0.060  0.069  0.465
F       0.267  0.040  0.046  0.052  0.060  0.535



(b) M2: our mechanism 

In/Out    A      B      C      D      E      F
A        2/7    1/7    1/7    1/7    1/7    1/7
B        1/7    2/7    1/7    1/7    1/7    1/7
C        1/7    1/7    2/7    1/7    1/7    1/7
D        1/7    1/7    1/7    2/7    1/7    1/7
E        1/7    1/7    1/7    1/7    2/7    1/7
F        1/7    1/7    1/7    1/7    1/7    2/7



Table 1. Mechanisms for the city with the highest number of votes for candidate cand 



(Y, ~) has low connectivity, in particular in the cases of a ring and of a line. But 
in this example, we are not in these cases, because we are considering binary 
gain functions and high connectivity. 
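
The utility figures quoted in Example 1 can be checked with a short Python 
sketch. It assumes that the utility of a mechanism under a binary gain function 
is the expected gain when each reported answer z is remapped to its most likely 
true answer, i.e. the sum over z of max over y of p(y) p(z|y); this reading is 
an assumption, chosen because it reproduces the quoted numbers. The skewed 
prior puts 1/10 on A and F and 1/5 on the other cities, the reading under 
which the probabilities sum to one. The function name utility is hypothetical. 

```python
def utility(prior, channel):
    """Expected binary gain with optimal remapping of the reported answer.

    prior[y] is the probability of true answer y; channel[y][z] is the
    probability of reporting z when the true answer is y.
    Returns sum over z of max over y of prior[y] * channel[y][z].
    """
    n_outputs = len(channel[0])
    return sum(
        max(p * row[z] for p, row in zip(prior, channel))
        for z in range(n_outputs)
    )


# Rows and columns are ordered A..F; entries are copied from Table 1.
M1 = [
    [0.535, 0.060, 0.052, 0.046, 0.040, 0.267],
    [0.465, 0.069, 0.060, 0.053, 0.046, 0.307],
    [0.405, 0.060, 0.069, 0.060, 0.053, 0.353],
    [0.353, 0.053, 0.060, 0.069, 0.060, 0.405],
    [0.307, 0.046, 0.053, 0.060, 0.069, 0.465],
    [0.267, 0.040, 0.046, 0.052, 0.060, 0.535],
]
M2 = [[2 / 7 if i == j else 1 / 7 for j in range(6)] for i in range(6)]

uniform = [1 / 6] * 6
skewed = [1 / 10, 1 / 5, 1 / 5, 1 / 5, 1 / 5, 1 / 10]

for name, prior in (("uniform", uniform), ("skewed", skewed)):
    print(name, round(utility(prior, M1), 4), round(utility(prior, M2), 4))
```

Because the entries of M1 in Table 1 are rounded to three decimals, the value 
computed for M1 under the uniform prior agrees with the quoted 0.2242 only to 
about three decimal places. 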

7 Related work 

As far as we know, the first work to investigate the relation between differential 
privacy and information-theoretic leakage for an individual was [5]. In that work, 
a channel is relative to a given database x, and the channel inputs are all possible 
databases adjacent to x. Two bounds on leakage were presented, one for the 
Rényi min entropy, and one for Shannon entropy. Our bound in Proposition 2 
is an improvement with respect to the (Rényi min entropy) bound in [2]. 

Barthe and Köpf [4] were the first to investigate the (more challenging) 
connection between differential privacy and the Rényi min-entropy leakage for 
the entire universe of possible databases. They consider the "end-to-end 
differentially private mechanisms", which correspond to what we call K in our 
paper, and propose, like we do, to interpret them as information-theoretic 
channels. They provide a bound for the leakage, but point out that it is not 
tight in general, and show that there cannot be a domain-independent bound, by 
proving that for any number of individuals u the optimal bound must be at least 
a certain expression f(u, ε). Finally, they show that the question of providing 
optimal upper bounds for the leakage of ε-differentially private randomized 
functions in terms of rational functions of ε is decidable, and leave the actual 
function as an open question. In our work we used rather different techniques 
and found (independently) the same function f(u, ε) (the bound in Theorem 1), 
but we actually proved that f(u, ε) is the optimal bound†. Another difference is 
that [4] captures the case in which the focus of differential privacy is on hiding 
participation of individuals in a database. In our work, we consider both the 
participation and the values of the participants. 

† When discussing our result with Barthe and Köpf, they said that they also 
conjectured that f(u, ε) is the optimal bound. 

Clarkson and Schneider also considered differential privacy as a case study 
of their proposal for quantification of integrity [9]. There, the authors analyze 
database privacy conditions from the literature (such as differential privacy, 
k-anonymity, and ℓ-diversity) using their framework for utility quantification. In 
particular, they study the relationship between differential privacy and a notion 
of leakage (which is different from ours: in particular, their definition is based 
on Shannon entropy) and they provide a tight bound on leakage. 

Heusser and Malacaria [15] were among the first to explore the application 
of information-theoretic concepts to database queries. They proposed to model 
database queries as programs, which allows for static analysis of the 
information leaked by the query. However, [15] did not attempt to relate 
information leakage to differential privacy. 

In [14] the authors aim at obtaining optimal-utility randomization 
mechanisms while preserving differential privacy. The authors propose adding 
noise to the output of the query according to the geometric mechanism. Their 
framework is very interesting in the sense that it provides a general definition 
of utility for a mechanism M that captures any possible side information and 
preference (defined as a loss function) the users of M may have. They prove that 
the geometric mechanism is optimal in the particular case of counting queries. 
Our results in Section 5 do not restrict to counting queries, but on the other 
hand we only consider the case of binary loss functions. 
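
For readers unfamiliar with the geometric mechanism, the following Python 
sketch shows its truncated (range-restricted) form as it is commonly stated for 
a counting query with answers in {0, ..., n}. The closed form used here is the 
standard textbook presentation, not a formula taken from this paper, and the 
name truncated_geometric is hypothetical. With alpha = e^(-ε), an interior 
answer z is reported with probability (1 - alpha)/(1 + alpha) * alpha^|z - y|, 
and the two end points absorb the noise that would fall outside the range. 

```python
from fractions import Fraction


def truncated_geometric(n, alpha):
    """Channel matrix of the truncated geometric mechanism on counts 0..n.

    Entry [y][z] is the probability of reporting z when the true count is y.
    alpha stands for e^(-eps), with 0 < alpha < 1; n must be at least 1.
    """
    matrix = []
    for y in range(n + 1):
        row = []
        for z in range(n + 1):
            decay = alpha ** abs(z - y)
            if z in (0, n):
                # End points collect all the mass that falls outside 0..n.
                row.append(decay / (1 + alpha))
            else:
                row.append(decay * (1 - alpha) / (1 + alpha))
        matrix.append(row)
    return matrix


# alpha = 1/2 corresponds to eps = log 2, the value used in Example 1.
G = truncated_geometric(5, Fraction(1, 2))
print([sum(row) for row in G])
```

Using exact fractions makes it easy to confirm that every row sums to one and 
that entries in the same column of adjacent rows differ by a factor of at most 
e^ε. 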

8 Conclusion and future work 

In this paper we have investigated the relation between ε-differential privacy and 
leakage, and between ε-differential privacy and utility. Our main contribution is 
the development of a general technique for determining these relations depending 
on the graph structure induced by the adjacency relation and by the query. We 
have considered two particular structures, the distance-regular graphs and the 
VT+ graphs, which allow us to obtain tight bounds on the leakage and on the 
utility, and to construct the optimal randomization mechanism satisfying 
ε-differential privacy. 

As future work, we plan to extend our result to other kinds of utility 
functions. In particular, we are interested in the case in which the answer domain 
is provided with a metric, and we are interested in taking into account the degree 
of accuracy of the inferred answer. 
Fig. 1. Model of leakage and utility (diagram labels: K, the ε-differentially 
private randomized function; Utility; Leakage) 

Fig. 2. Matrix transformations for distance-regular and VT++ graphs 

Fig. 3. Hypercube 

Fig. 4. A VT++ graph without a single-orbit automorphism 




Fig. 6. Partial view of the V^{Ind} graph 

Fig. 7. Venn diagram for the types of graphs, where S* = {V^{Ind} | |V| = 2, |Ind| < 2} 

Fig. 8. Some distance-regular graphs: (a) Tetrahedral graph, (b) Cubical graph, 
(c) Petersen graph 



